Regularized system identification using orthonormal basis functions

Tianshi Chen and Lennart Ljung 


Abstract: Most existing results on regularized system identification focus on regularized impulse response estimation. Since the impulse response model is a special case of orthonormal basis functions, it is interesting to ask whether regularized system identification can be tackled using more compact orthonormal basis functions. In this paper, we explore two possibilities. First, we construct a reproducing kernel Hilbert space of impulse responses from orthonormal basis functions and then use the induced reproducing kernel for regularized impulse response estimation. Second, we extend the regularization method from impulse response estimation to the more general orthonormal basis functions estimation. In both cases, the poles of the basis functions are treated as hyper-parameters and estimated by the empirical Bayes method. We further show that the former is a special case of the latter; more specifically, the former is equivalent to ridge regression of the coefficients of the orthonormal basis functions.

I. INTRODUCTION 

In this paper, we consider the system identification problem for linear, discrete-time, time-invariant and causal systems, described as follows:

$$y(t) = g^0 * u(t) + v(t), \quad t = 1, \ldots, N \tag{1}$$

where $t = 1, \ldots, N$ are the time indices at which the measured input $u(t)$ and output $y(t)$ are collected, uniform sampling is used with sampling interval $T_s = 1$, $v(t)$ is the disturbance, assumed for convenience to be zero-mean white Gaussian noise, $g^0(t)$ with $t = 1, 2, \ldots$ is the impulse response, and $g^0 * u(t)$ is the convolution of $g^0(t)$ and $u(t)$ evaluated at time $t$. The goal is to estimate $g^0(t)$ as well as possible based on the collected data $\{y(t), u(t)\}_{t=1}^{N}$.

The traditional approach to this problem is the maximum likelihood/prediction error method (ML/PEM), see e.g., [1], [2]. Since $v(t)$ is white, PEM first postulates the so-called output error (OE) model structure $G(q, \theta)$ with $\theta \in \mathbb{R}^n$:

$$y(t) = G(q, \theta) u(t) + v(t), \tag{2}$$

where $q$ is the forward-shift operator with $q u(t) = u(t + 1)$ and

$$G(q, \theta) = \frac{B(q)}{F(q)}, \quad B(q) = b_1 q^{-1} + \cdots + b_{n_b} q^{-n_b}, \quad F(q) = 1 + f_1 q^{-1} + \cdots + f_{n_f} q^{-n_f}, \tag{3}$$

with $\theta = [b_1, \ldots, b_{n_b}, f_1, \ldots, f_{n_f}]^T$ and $n = n_b + n_f$.

Once a model structure $G(q, \theta)$ is chosen, ML/PEM minimizes the prediction error to obtain the model estimate

$$\hat\theta = \arg\min_{\theta} \sum_{t=1}^{N} \left(y(t) - G(q, \theta) u(t)\right)^2. \tag{4}$$

Since the disturbance $v(t)$ in (1) is modeled as a stochastic process, the estimate $\hat\theta$ is a random variable. Let $\hat g$ denote the impulse response of $G(q, \hat\theta)$. Then the mean square error (MSE) $\mathbb{E}(\|\hat g - g^0\|^2)$ measures the quality of the estimate $\hat\theta$. For the chosen model structure $G(q, \theta)$, a key issue in reducing the MSE is to find the "right" model complexity: the model should be parsimonious yet capable of describing the data. Traditionally, model structure selection criteria such as AIC and BIC are used to find a suitable $n$, the dimension of $\theta$. However, this approach may not work well for short and noisy data records.

The model structure (2) has a very general form and includes many widely used model structures as special cases. One attractive class among them is the linear-in-parameter model structures, which can considerably simplify the optimization in (4). The best-known instance is perhaps the finite impulse response (FIR) model structure

$$G(q, \theta) = \sum_{k=1}^{n} g_k q^{-k}, \quad \theta = [g_1, \ldots, g_n]^T. \tag{5}$$

However, the FIR model is often criticized for its large variance error when high-order FIR models have to be used to describe "slow" systems, i.e., systems with slow dynamics or a high sampling rate. A more compact model structure is the linear combination of basis functions:

$$G(q, \theta) = \sum_{k=1}^{m} g_k F_k(q), \quad \theta = [g_1, \ldots, g_m, f_1, \ldots, f_{n_f}]^T \tag{6}$$

where $n = m + n_f$ and $F_k(q) = q^{-k}/F(q)$, $k = 1, \ldots, m$, are pre-specified basis functions. The model structure (6) has attracted a lot of interest in the last two decades, see e.g., [3] and the references therein. Two widely known special cases of (6) are the Laguerre model [4] and the Kautz model [5]. The Laguerre model takes the form


$$G(q, \theta) = \sum_{k=1}^{m} g_k \frac{\sqrt{1 - a^2}}{q - a} \left(\frac{1 - a q}{q - a}\right)^{k-1}, \quad \theta = [g_1, \ldots, g_m]^T, \quad |a| < 1 \tag{7}$$


where $a$ is the pole of the Laguerre model and has to be pre-specified according to a priori information on the time constant of the underlying system [5]. Since the basis functions have infinite impulse responses, "slow" systems can often be described with a relatively small number of basis functions in (6). While the use of orthonormal basis functions (6) has been discussed extensively, two problems remain open:

1) How should suitable poles for the basis functions be chosen?

2) How many basis functions should be used?

There is another way to reduce the MSE, namely regularization. However, this approach was not investigated rigorously in system identification until the seminal work [6]. Instead of trimming the model complexity of $G(q, \theta)$ in terms of $n$, it was suggested to use well-tuned regularization of the impulse response to reduce the MSE [7]. Since then, the follow-up results in [8], [7], [9], [10], [11] and the recent survey paper [12] have shown that the regularized high-order FIR model (or high-order ARX model) can lead to good model estimates in terms of accuracy and robustness. In this paper, we make use of orthonormal basis functions for regularized system identification and consider two cases. First, we construct a reproducing kernel Hilbert space of impulse responses from orthonormal basis functions and then use the induced reproducing kernel for regularized impulse response estimation. Second, we extend the regularization method from impulse response estimation to the more general orthonormal basis functions estimation. In both cases, the poles of the basis functions are treated as hyper-parameters and estimated by the empirical Bayes method. We further show that the former is a special case of the latter; more specifically, the former is equivalent to ridge regression of the coefficients of the orthonormal basis functions.

II. Regularized least squares method 

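Consider a linear regression model

$$Y_N = \Phi_N \theta + V_N \tag{8}$$

where $Y_N \in \mathbb{R}^N$ is the data, $\Phi_N \in \mathbb{R}^{N \times n}$ is the regression matrix, $\theta \in \mathbb{R}^n$ is the parameter to be estimated, and $V_N$ is the disturbance, assumed to be white Gaussian distributed as $\mathcal{N}(0, \sigma^2 I_N)$ with $I_N$ being the $N$-dimensional identity matrix. We estimate $\theta$ by minimizing the regularized least squares (RLS) criterion

$$\hat\theta = \arg\min_{\theta} \|Y_N - \Phi_N \theta\|_2^2 + \sigma^2 \theta^T K(\alpha)^{-1} \theta \tag{9a}$$

$$\phantom{\hat\theta} = K(\alpha) \Phi_N^T \left(\Phi_N K(\alpha) \Phi_N^T + \sigma^2 I_N\right)^{-1} Y_N. \tag{9b}$$

Here, $K(\alpha) \in \mathbb{R}^{n \times n}$ is called the regularization matrix (also often called the kernel matrix) and is defined through the kernel function $K(k, j; \alpha)$ as $K_{k,j}(\alpha) = K(k, j; \alpha)$, where $\alpha$ is a vector of tuning parameters called the hyper-parameter.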

There are two key issues:

1) How should the kernel function $K(k, j; \alpha)$, often simply written as $K(\alpha)$ below, be parameterized?

2) How should the hyper-parameter $\alpha$ be tuned?

Footnote 1: When $K(\alpha)$ is singular, (9a) has to be interpreted in the way discussed in [9, Remark 2.1].

For 1), it is worth noting [7, Theorem 1] that the optimal regularization matrix, in the sense of minimizing the MSE matrix of $\hat\theta$ with respect to $\theta_0$ (the true value of $\theta$), exists and takes the form $K = \theta_0 \theta_0^T$. While it cannot be applied in practice, it gives a guideline for designing the regularization matrix: let it mimic the behavior of $\theta_0 \theta_0^T$. Clearly, if some prior information is known about $\theta_0$, it should be used in the design of a suitable kernel function $K(\alpha)$.

For 2), the currently most effective method is to embed the regularization in a Bayesian framework and invoke the empirical Bayes method, i.e., marginal likelihood maximization. Assume $\theta \sim \mathcal{N}(0, K(\alpha))$. Then we estimate $\alpha$ by

$$\hat\alpha = \arg\max_{\alpha} p(Y_N | \alpha) = \arg\min_{\alpha} Y_N^T \left(\Phi_N K(\alpha) \Phi_N^T + \sigma^2 I_N\right)^{-1} Y_N + \log\det\left(\Phi_N K(\alpha) \Phi_N^T + \sigma^2 I_N\right) \tag{10}$$
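As a minimal sketch of the two steps in this section, the following NumPy code evaluates the RLS estimate in the form (9b) and the empirical Bayes cost in (10). The function names, the random test data, the scaled identity kernel and the grid search over its scale are illustrative assumptions, not part of the paper; the noise variance is assumed known.

```python
import numpy as np

def rls_estimate(Phi, Y, K, sigma2):
    """RLS estimate in the form (9b): K Phi' (Phi K Phi' + sigma2 I)^{-1} Y."""
    N = Phi.shape[0]
    S = Phi @ K @ Phi.T + sigma2 * np.eye(N)
    return K @ Phi.T @ np.linalg.solve(S, Y)

def eb_cost(Phi, Y, K, sigma2):
    """Cost in (10): Y' S^{-1} Y + log det S with S = Phi K Phi' + sigma2 I."""
    N = Phi.shape[0]
    S = Phi @ K @ Phi.T + sigma2 * np.eye(N)
    L = np.linalg.cholesky(S)          # S is positive definite since sigma2 > 0
    z = np.linalg.solve(L, Y)          # z'z = Y' S^{-1} Y
    return float(z @ z + 2.0 * np.sum(np.log(np.diag(L))))

# Hypothetical data: random regressors and an exponentially decaying parameter.
rng = np.random.default_rng(0)
N, n, sigma2 = 50, 8, 0.1
Phi = rng.standard_normal((N, n))
theta0 = 0.6 ** np.arange(1, n + 1)
Y = Phi @ theta0 + np.sqrt(sigma2) * rng.standard_normal(N)

# Assumed kernel family K(alpha) = c * I, tuned by a coarse grid search on c.
grid = [0.01, 0.1, 1.0, 10.0]
costs = [eb_cost(Phi, Y, c * np.eye(n), sigma2) for c in grid]
c_hat = grid[int(np.argmin(costs))]
theta_hat = rls_estimate(Phi, Y, c_hat * np.eye(n), sigma2)
```

A grid search is used here only for brevity; any numerical optimizer could be substituted for the minimization in (10).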

A. Regularized impulse response estimation 

For regularized impulse response estimation, we consider the FIR model (5) with $n = \infty$. The system (1) can then be written as a linear regression model (8), with the $i$th rows of $Y_N$, $V_N$ and $\Phi_N$ being $y(i)$, $v(i)$ and $\varphi(i)^T$ with $\varphi(i) = [u(i-1), \ldots, u(i-\infty)]^T$, where the unknown inputs $u(t)$ are set to zero, and $\theta = [g_1, g_2, \ldots]^T \in \mathbb{R}^{\infty}$. So we can use the RLS method to estimate the impulse response. The remaining issue is the design of a suitable kernel function $K(\alpha)$. Several choices have been suggested in [6], [8], [7]. For example, the diagonal-correlated (DC) kernel and its special case, the tuned-correlated (TC) kernel, are defined as

$$\text{DC:} \quad K^{\mathrm{DC}}(k, j; \alpha) = c \lambda^{(k+j)/2} \rho^{|k-j|}, \quad \alpha = [c \;\; \lambda \;\; \rho]^T \tag{11}$$

$$\text{TC:} \quad K^{\mathrm{TC}}(k, j; \alpha) = c \min(\lambda^k, \lambda^j), \quad \alpha = [c \;\; \lambda]^T \tag{12}$$

where the TC kernel has also been introduced as the first-order stable spline (SS) kernel, see [13], [14] for discussions. In practice, however, we cannot handle an infinite impulse response and have to truncate it to a finite one, i.e., the FIR model. In this case, the method is referred to as the regularized FIR model method in [7].
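The kernels (11) and (12) can be turned into regularization matrices for a truncated FIR model as sketched below. The function names and the numerical values are illustrative assumptions. The check at the end uses the fact that (11) reduces to (12) when $\rho = \sqrt{\lambda}$, since then $\lambda^{(k+j)/2} \lambda^{|k-j|/2} = \lambda^{\max(k,j)}$.

```python
import numpy as np

def tc_kernel(n, c, lam):
    """TC kernel matrix (12): K[k, j] = c * min(lam**k, lam**j), k, j = 1..n."""
    k = np.arange(1, n + 1)
    return c * np.minimum.outer(lam ** k, lam ** k)

def dc_kernel(n, c, lam, rho):
    """DC kernel matrix (11): K[k, j] = c * lam**((k + j) / 2) * rho**|k - j|."""
    k = np.arange(1, n + 1)
    decay = lam ** ((k[:, None] + k[None, :]) / 2.0)
    corr = rho ** np.abs(k[:, None] - k[None, :])
    return c * decay * corr

# Hypothetical hyper-parameter values for a 6-tap FIR model.
K_tc = tc_kernel(6, c=2.0, lam=0.8)
K_dc = dc_kernel(6, c=2.0, lam=0.8, rho=np.sqrt(0.8))
same = np.allclose(K_tc, K_dc)   # TC is the DC kernel with rho = sqrt(lam)
```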

III. Regularized impulse response estimation 
WITH kernel structure CONSTRUCTED BY 
ORTHONORMAL BASIS FUNCTIONS 

In the following, we consider a different kernel, constructed from orthonormal basis functions. Before proceeding to the details, recall that the RLS criterion (9a) for regularized impulse response estimation has a function estimation interpretation. The RLS criterion (9a) is equivalent to

$$\hat\vartheta = \arg\min_{\vartheta \in \mathcal{H}_{K(\alpha)}} \sum_{t=1}^{N} |y(t) - \vartheta * u(t)|^2 + \sigma^2 \|\vartheta\|^2_{\mathcal{H}_{K(\alpha)}} \tag{13}$$

where $\vartheta(t) = \theta_t$, with $\theta_t$ being the $t$th element of $\theta \in \mathbb{R}^{\infty}$, is the impulse response, and $\mathcal{H}_{K(\alpha)}$ is the reproducing kernel Hilbert space (RKHS) induced by the kernel $K(\alpha)$. The RLS estimate is thus also the function estimate that minimizes (13) within the RKHS $\mathcal{H}_{K(\alpha)}$. This implies that when we trim the kernel $K(\alpha)$, we equivalently trim the function space in which we search for the impulse response.

The above observation suggests another way to design the kernel structure: we can first construct an RKHS of suitable impulse responses, and this space then uniquely determines a reproducing kernel according to the Moore-Aronszajn theorem, see e.g., [15]. Note that looking for an RKHS of impulse responses in the time domain is equivalent to looking for an RKHS of transfer functions in the frequency domain. In the system identification community, the idea of approximating or expressing the transfer function of the underlying system by expanding it in terms of orthogonal basis functions has been well studied, see e.g., [3], [16], [17], [18], [19] and the references therein. It is natural to ask whether the space spanned by orthogonal basis functions could be a candidate for our use. To answer this question, we have to check whether this space is an RKHS and, if it is, what its reproducing kernel is. Fortunately, there are standard answers to these questions.


A. Transfer function space spanned by the orthonormal basis 
functions on the unit circle [20], [18] 

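Following [20], let $\{a_k\}_{k=0}^{\infty}$ with $|a_k| < 1$ be an arbitrary sequence of complex numbers, which may contain numbers of finite or even infinite multiplicity. Given such a sequence, a system of functions $\{\phi_n(e^{i\omega})\}_{n=0}^{\infty}$ is defined as

$$\phi_0(e^{i\omega}) = \frac{\sqrt{1 - |a_0|^2}}{1 - \bar a_0 e^{i\omega}}, \quad \phi_n(e^{i\omega}) = \frac{\sqrt{1 - |a_n|^2}}{1 - \bar a_n e^{i\omega}} \prod_{k=0}^{n-1} \frac{a_k - e^{i\omega}}{1 - \bar a_k e^{i\omega}} \frac{|a_k|}{a_k}, \quad n = 1, 2, \ldots \tag{14}$$

where $\bar a_j$ denotes the complex conjugate of $a_j$, $\omega \in [-\pi, \pi)$, and $|a_j|/a_j = -1$ for $a_j = 0$. Such a system is called the Malmquist system. It is well known that the Malmquist system is orthonormal on the unit circle in the sense that

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \phi_k(e^{i\omega}) \overline{\phi_j(e^{i\omega})} \, d\omega = \delta_{k,j} = \begin{cases} 0 & k \neq j \\ 1 & k = j \end{cases} \tag{15}$$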

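We are interested in the space spanned by a subset of the Malmquist system (14). It can be shown, see e.g., [18], that the space spanned by $\{\phi_0(e^{i\omega}), \phi_1(e^{i\omega}), \ldots, \phi_m(e^{i\omega})\}$ with the inner product defined as

$$\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(e^{i\omega}) \overline{g(e^{i\omega})} \, d\omega \tag{16}$$

is an RKHS with the reproducing kernel

$$K^{\mathrm{ob}}_{\mathrm{freq}}(e^{i\omega}, e^{i\omega'}) = \sum_{k=0}^{m} \phi_k(e^{i\omega}) \overline{\phi_k(e^{i\omega'})} \tag{17}$$

which we refer to below as the $(m+1)$th order orthonormal basis (OB) kernel in the frequency domain.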


Setting in the following

$$B_{m+1}(e^{i\omega}) = \prod_{k=0}^{m} \frac{a_k - e^{i\omega}}{1 - \bar a_k e^{i\omega}} \frac{|a_k|}{a_k} \tag{18}$$

where, as before, $|a_k|/a_k = -1$ for $a_k = 0$, the kernel (17) has a simplified expression [20, Lemma 5]

$$K^{\mathrm{ob}}_{\mathrm{freq}}(e^{i\omega}, e^{i\omega'}) = \frac{1 - B_{m+1}(e^{i\omega}) \overline{B_{m+1}(e^{i\omega'})}}{1 - e^{i(\omega - \omega')}} \tag{19}$$

which is also known as the Christoffel-Darboux (C-D) formula, see e.g., [18, Theorem 3.1]. The C-D formula is useful for simplifying the construction of the kernel matrix (i.e., the regularization matrix).


B. The Laguerre kernel 

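The simplest case of the OB kernel (19) is perhaps the one where $a_i = a$ for $i = 0, 1, \ldots, m$ with $a \in \mathbb{R}$ and $|a| < 1$. In this case, the OB kernel (19) becomes

$$K^{\mathrm{ob}}_{\mathrm{freq}}(e^{i\omega}, e^{i\omega'}) = \begin{cases} \dfrac{1 - \left(\dfrac{a - e^{i\omega}}{1 - a e^{i\omega}}\right)^{m+1} \left(\dfrac{a - e^{-i\omega'}}{1 - a e^{-i\omega'}}\right)^{m+1}}{1 - e^{i(\omega - \omega')}} & \omega \neq \omega' \\[2ex] \dfrac{(m+1)(1 - a^2)}{|1 - a e^{i\omega}|^2} & \omega = \omega' \end{cases} \tag{20}$$

which we refer to below as the $(m+1)$th order Laguerre kernel, because it is the reproducing kernel of the RKHS spanned by the first $m+1$ Laguerre rational basis functions of (7) in the frequency domain. The Laguerre kernel (20) has only one hyper-parameter $a$, the real pole of the Laguerre basis functions, which is convenient for hyper-parameter estimation.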
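The agreement between the sum form (17) and the closed form of the C-D type for a single repeated real pole can be checked numerically. The sketch below assumes the Malmquist functions take the form $\phi_k(e^{i\omega}) = \frac{\sqrt{1-a^2}}{1 - a e^{i\omega}} \left(\frac{|a|}{a} \frac{a - e^{i\omega}}{1 - a e^{i\omega}}\right)^k$ for $a_k = a$; the function names and test frequencies are hypothetical.

```python
import numpy as np

def _sign(a):
    """Convention |a|/a, taken as -1 when a = 0."""
    return -1.0 if a == 0 else float(np.sign(a))

def laguerre_basis_freq(a, m, w):
    """phi_0..phi_m for a repeated real pole a, evaluated at z = exp(i w)."""
    z = np.exp(1j * w)
    first = np.sqrt(1.0 - a**2) / (1.0 - a * z)
    allpass = _sign(a) * (a - z) / (1.0 - a * z)
    return np.array([first * allpass**k for k in range(m + 1)])

def kernel_sum(a, m, w1, w2):
    """Kernel as the finite sum of phi_k(w1) * conj(phi_k(w2))."""
    return np.sum(laguerre_basis_freq(a, m, w1) * np.conj(laguerre_basis_freq(a, m, w2)))

def kernel_cd(a, m, w1, w2):
    """Closed form (1 - B(w1) conj(B(w2))) / (1 - exp(i (w1 - w2))), w1 != w2."""
    z1, z2 = np.exp(1j * w1), np.exp(1j * w2)
    B1 = (_sign(a) * (a - z1) / (1.0 - a * z1)) ** (m + 1)
    B2 = (_sign(a) * (a - z2) / (1.0 - a * z2)) ** (m + 1)
    return (1.0 - B1 * np.conj(B2)) / (1.0 - z1 * np.conj(z2))

a, m = 0.7, 5
off_diag_match = np.isclose(kernel_sum(a, m, 0.3, 1.1), kernel_cd(a, m, 0.3, 1.1))
# On the diagonal every term has modulus (1 - a^2) / |1 - a e^{iw}|^2.
w = 0.4
diag_closed = (m + 1) * (1.0 - a**2) / abs(1.0 - a * np.exp(1j * w)) ** 2
diag_match = np.isclose(kernel_sum(a, m, w, w), diag_closed)
```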


C. Regularized frequency response estimation 

Since OB kernels are defined in the frequency domain, one may wonder whether it is possible to work in the frequency domain directly without going back to the time domain. The answer is affirmative. Recently, we have derived the dual of regularized impulse response estimation in the frequency domain, i.e., regularized frequency response estimation, see [21] for details. By using the implementation in [21], we can derive the regularized frequency response estimate with OB kernels.


IV. Regularized orthonormal basis functions 

ESTIMATION 

It is worth noting that the hyper-parameters of the OB kernel (19) are $\{a_k\}_{k=0}^{m}$, which are the poles of the basis functions. So tuning OB kernels is equivalent to tuning the locations of the poles of the underlying basis functions. This finding motivates another way of using orthonormal basis functions for regularized system identification:

1) formulate the orthonormal basis functions based model 
as a linear regression model; 

2) treat the poles of the orthonormal basis functions as 
hyper-parameters and design a suitable kernel for the 
coefficients of the orthonormal basis functions; 

3) estimate the hyper-parameter by empirical Bayes 
method and then obtain the regularized orthonormal 
basis functions by using RLS method. 
















We first formulate (1), with the linear combination of orthonormal basis functions (6), as a linear regression model:

$$y(t) = \sum_{k=1}^{m} g_k \varphi_k * u(t) + v(t) \tag{21}$$

where $g = [g_1, \ldots, g_m]^T$ and $\varphi_k(t)$ is the impulse response of $F_k(q)$ in (6). Let $p$ be the vector consisting of all poles of $F_k(q)$, $k = 1, \ldots, m$. Then the impulse responses $\varphi_k(t)$, $k = 1, \ldots, m$, depend on $p$.
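For the Laguerre model (7), the regressors $\varphi_k * u(t)$ in (21) can be generated by passing the input through a first-order low-pass section followed by a cascade of all-pass sections. The sketch below is one possible realization with zero initial conditions; the function name and the test signal are illustrative assumptions. Feeding an impulse returns the (truncated) Laguerre impulse responses, whose Gram matrix should be close to the identity.

```python
import numpy as np

def laguerre_regressors(u, a, m):
    """Column k-1 holds phi_k * u(t) for the k-th Laguerre filter in (7)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    Phi = np.zeros((N, m))
    # First section sqrt(1 - a^2) / (q - a): x(t) = a x(t-1) + sqrt(1 - a^2) u(t-1)
    x = np.zeros(N)
    gain = np.sqrt(1.0 - a**2)
    for t in range(1, N):
        x[t] = a * x[t - 1] + gain * u[t - 1]
    Phi[:, 0] = x
    # All-pass sections (1 - a q) / (q - a): x_k(t) = a x_k(t-1) + x_{k-1}(t-1) - a x_{k-1}(t)
    for k in range(1, m):
        prev, x = x, np.zeros(N)
        x[0] = -a * prev[0]
        for t in range(1, N):
            x[t] = a * x[t - 1] + prev[t - 1] - a * prev[t]
        Phi[:, k] = x
    return Phi

# Impulse input: the columns are the impulse responses phi_1(t), ..., phi_m(t).
impulse = np.zeros(400)
impulse[0] = 1.0
responses = laguerre_regressors(impulse, a=0.5, m=4)
gram = responses.T @ responses   # approximately the identity matrix
```

The pole $a$ enters only through this filtering step, which is why it can be treated as a hyper-parameter of the regression matrix.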

The fact that the hyper-parameters of OB kernels are the poles of the basis functions motivates treating the poles of the basis functions as hyper-parameters and estimating them by the empirical Bayes method. It should be noted that this idea was also arrived at independently by Darwish, Toth, and Van den Hof in [22]. For now we assume that $p$ is known; we can then estimate $g$ by minimizing the RLS criterion

$$\hat g = \arg\min_{g} \sum_{t=1}^{N} \Big| y(t) - \sum_{k=1}^{m} g_k \varphi_k * u(t) \Big|^2 + \sigma^2 g^T K(\alpha)^{-1} g \tag{22}$$

$$\phantom{\hat g} = \arg\min_{g} \|Y_N - \Phi_N(p) g\|_2^2 + \sigma^2 g^T K(\alpha)^{-1} g \tag{23}$$

where $K(\alpha) \in \mathbb{R}^{m \times m}$ is the regularization matrix on the coefficients of the orthonormal basis functions, and $\Phi_N(p) \in \mathbb{R}^{N \times m}$ is the regression matrix, which can be formed in a natural way.

As discussed in Section II, a key issue is to design a suitable kernel structure, which relies on the prior knowledge available about the coefficients of the orthonormal basis functions. Clearly, this issue depends on which orthonormal basis functions we use. For illustration, we consider the Laguerre model (7) as an example below.

The assumptions on the Laguerre coefficients $\{g_k\}_{k=1}^{\infty}$ and the convergence properties of the Laguerre model, i.e., how fast (7) converges as $m \to \infty$, have been discussed, see e.g., [4]. It is suggested in [4] to assume absolute convergence of the sum of the Laguerre coefficients $\{g_k\}_{k=1}^{\infty}$, i.e.,

$$\sum_{k=1}^{\infty} |g_k| < \infty \tag{24}$$

If we treat $\{g_k\}_{k=1}^{\infty}$ as the impulse response of a linear system, then the assumption (24) says nothing more than that this linear system is stable. This observation implies that the kernels introduced for regularized impulse response estimation, i.e., the SS, TC and DC kernels, are candidates for regularizing the Laguerre coefficients $\{g_k\}_{k=1}^{\infty}$.

Remark 4.1: As pointed out in [4], the convergence rate of the Laguerre model (7) can be slow, e.g., if the system has poles close to the unit circle or has highly resonant poles. In this case, one can try the adapted DC kernel as follows:

$$K(k, j) = c \, \lambda(k + j) \, \rho^{|k-j|} \tag{25}$$


For more general orthonormal basis functions, we can always first try the SS, TC and DC kernels if (24) is assumed. If they do not work well, more effort should be spent on investigating the prior knowledge or assumptions on the coefficients of the orthonormal basis functions, and a suitable kernel structure should be designed accordingly.

Now it remains to estimate the hyper-parameters: the poles $p$ of the orthonormal basis functions and the hyper-parameter $\alpha$ used to parameterize the kernel structure. Assume $g \sim \mathcal{N}(0, K(\alpha))$. Then from (21) we have

$$\hat p, \hat\alpha = \arg\max_{p, \alpha} p(Y_N | p, \alpha) = \arg\min_{p, \alpha} Y_N^T \left(\Phi_N(p) K(\alpha) \Phi_N(p)^T + \sigma^2 I_N\right)^{-1} Y_N + \log\det\left(\Phi_N(p) K(\alpha) \Phi_N(p)^T + \sigma^2 I_N\right)$$

Finally, solving (22) or (23) with $p, \alpha$ replaced by $\hat p, \hat\alpha$ yields the regularized orthonormal basis function estimate.
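The whole procedure of this section can be sketched for the Laguerre model with a TC kernel on the coefficients. Everything specific below is an assumption made for illustration: the helper names, the simulated data, the known noise variance, and the coarse grid search that stands in for a proper numerical maximization of the marginal likelihood.

```python
import numpy as np

def laguerre_regressors(u, a, m):
    """Regression matrix Phi_N(p) for the Laguerre model (7), zero initial state."""
    N = len(u)
    Phi = np.zeros((N, m))
    x = np.zeros(N)
    for t in range(1, N):
        x[t] = a * x[t - 1] + np.sqrt(1.0 - a**2) * u[t - 1]
    Phi[:, 0] = x
    for k in range(1, m):
        prev, x = x, np.zeros(N)
        x[0] = -a * prev[0]
        for t in range(1, N):
            x[t] = a * x[t - 1] + prev[t - 1] - a * prev[t]
        Phi[:, k] = x
    return Phi

def tc_kernel(m, c, lam):
    k = np.arange(1, m + 1)
    return c * np.minimum.outer(lam ** k, lam ** k)

def eb_cost(Phi, Y, K, sigma2):
    """Negative log marginal likelihood up to constants, as in (10)."""
    S = Phi @ K @ Phi.T + sigma2 * np.eye(len(Y))
    L = np.linalg.cholesky(S)
    z = np.linalg.solve(L, Y)
    return float(z @ z + 2.0 * np.sum(np.log(np.diag(L))))

def fit_rlag(u, y, m, sigma2, poles, cs, lams):
    """Grid search over (pole, c, lambda), then the RLS estimate of g."""
    best = None
    for a in poles:
        Phi = laguerre_regressors(u, a, m)
        for c in cs:
            for lam in lams:
                cost = eb_cost(Phi, y, tc_kernel(m, c, lam), sigma2)
                if best is None or cost < best[0]:
                    best = (cost, a, c, lam)
    _, a, c, lam = best
    Phi, K = laguerre_regressors(u, a, m), tc_kernel(m, c, lam)
    S = Phi @ K @ Phi.T + sigma2 * np.eye(len(y))
    g = K @ Phi.T @ np.linalg.solve(S, y)
    return a, (c, lam), g

# Hypothetical data generated by a Laguerre model with pole 0.8.
rng = np.random.default_rng(0)
N, m, sigma2 = 150, 6, 0.05
u = rng.standard_normal(N)
g0 = np.array([1.0, 0.5, 0.25, 0.0, 0.0, 0.0])
y = laguerre_regressors(u, 0.8, m) @ g0 + np.sqrt(sigma2) * rng.standard_normal(N)
a_hat, alpha_hat, g_hat = fit_rlag(u, y, m, sigma2,
                                   poles=[0.2, 0.5, 0.8],
                                   cs=[0.1, 1.0, 10.0],
                                   lams=[0.5, 0.7, 0.9])
```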


V. Regularized impulse response estimation with 

THE OB KERNEL IS A SPECIAL CASE OF REGULARIZED 
ORTHONORMAL BASIS FUNCTIONS ESTIMATION 


In this section, we show that regularized impulse response estimation with the OB kernel (17) is a special case of regularized orthonormal basis functions estimation. More specifically, it is equivalent to ridge regression of the coefficients of the orthonormal basis functions, see e.g., [23].

To show this, it is more convenient to go back to the time domain. For the orthonormal basis functions $\{\phi_k(e^{i\omega})\}_{k=0}^{m}$ in the frequency domain, we can define their counterparts $\{\varphi_k(t)\}_{k=0}^{m}$ in the time domain. Here, $\varphi_k(t)$ is the impulse response of $\phi_k(e^{i\omega})$ and moreover, we have

$$\phi_k(e^{i\omega}) = \mathcal{F}\{\varphi_k(t)\}, \quad \varphi_k(t) = \mathcal{F}^{-1}\{\phi_k(e^{i\omega})\} \tag{26}$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the discrete-time Fourier transform and its inverse, respectively.

It is then straightforward to verify, using (15) and Cauchy's integral formula, that $\{\varphi_k(t)\}_{k=0}^{m}$ are orthonormal in the sense that

$$\sum_{t=0}^{\infty} \varphi_k(t) \varphi_j(t) = \delta_{k,j} = \begin{cases} 0 & k \neq j \\ 1 & k = j \end{cases} \tag{27}$$


Moreover, the space spanned by $\{\varphi_0(t), \ldots, \varphi_m(t)\}$ with the inner product

$$\langle f, h \rangle = \sum_{t=0}^{\infty} f(t) h(t) \tag{28}$$

is an RKHS with the reproducing kernel

$$K^{\mathrm{ob}}(t, s) = \sum_{k=0}^{m} \varphi_k(t) \varphi_k(s) \tag{29}$$

which we refer to below as the $(m+1)$th order OB kernel in the time domain. The OB kernel (29) in the time domain and the OB kernel (17) in the frequency domain are related through the Fourier transform.

Remark 4.1 (continued): In (25), $\lambda(\cdot)$ is a nonnegative function that decays more slowly than the exponential function and is such that (25) is a valid kernel. Alternatively, one can use other regularized orthonormal basis functions, such as the Kautz model in [5], to handle the case where the system has highly resonant poles.

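Now consider (13) with the kernel $K(\alpha)$ replaced by the OB kernel (29). The RKHS $\mathcal{H}_{K(\alpha)}$ becomes

$$\mathcal{H}_{K(\alpha)} = \mathrm{span}\{\varphi_0(t), \ldots, \varphi_m(t)\} = \Big\{ \vartheta \;\Big|\; \vartheta(t) = \sum_{k=1}^{m} g_k \varphi_k(t), \; g_k \in \mathbb{R} \Big\} \tag{30}$$

and moreover,

$$\|\vartheta\|^2_{\mathcal{H}_{K(\alpha)}} = \sum_{k=1}^{m} g_k^2 \tag{31}$$

Therefore, (13) is equivalent to

$$\hat g = \arg\min_{g} \sum_{t=1}^{N} \Big| y(t) - \sum_{k=1}^{m} g_k \varphi_k * u(t) \Big|^2 + \sigma^2 \|g\|_2^2 \tag{32}$$

where the regularization term $\|g\|_2^2$ corresponds to ridge regression of $g$.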

We have the following observations:

1) Regularized impulse response estimation with the OB kernel (29) (equivalently, (17)) is equivalent to ridge regression of the coefficients of the orthonormal basis functions (32), which is a special case of the regularized orthonormal basis functions estimation (22) with the regularization matrix $K(\alpha) = I_m$.

2) For the Laguerre kernel (20), the ridge regression penalty $\|g\|_2^2$, i.e., the kernel $K(k, j; \alpha) = \alpha \delta_{k,j}$, cannot guarantee the absolute convergence of the sum of the Laguerre model coefficients, i.e., (24). Since the kernel $K(k, j; \alpha) = \alpha \delta_{k,j}$ does not reflect our prior knowledge, it is not a good kernel, and regularized impulse response estimation with the OB kernel (29) is not expected to work well for high-order OB kernels. This claim is verified by the numerical simulations below.
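The equivalence in observation 1) can be checked numerically on a truncated problem. In the sketch below, a matrix $Q$ with orthonormal columns stands in for the first $m$ basis impulse responses truncated to $n$ taps (an assumption made to keep the example short; any orthonormal set serves for the algebraic identity), and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m, sigma2 = 40, 30, 5, 0.2

# Stand-in for orthonormal basis impulse responses: orthonormal columns from a QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))

Phi = rng.standard_normal((N, n))   # placeholder FIR regression matrix
Y = rng.standard_normal(N)          # placeholder output data

# (a) Regularized impulse response estimate (9b) with the OB kernel K = Q Q^T, cf. (29).
K = Q @ Q.T
S = Phi @ K @ Phi.T + sigma2 * np.eye(N)
theta_ob = K @ Phi.T @ np.linalg.solve(S, Y)

# (b) Ridge regression (32) on the coefficients g, with regressors Psi = Phi Q.
Psi = Phi @ Q
g_ridge = np.linalg.solve(Psi.T @ Psi + sigma2 * np.eye(m), Psi.T @ Y)
theta_ridge = Q @ g_ridge           # impulse response implied by the coefficients

equivalent = np.allclose(theta_ob, theta_ridge)
```

Note that $K = Q Q^T$ is singular for $m < n$, which is why form (9b) is used in (a) rather than the criterion with $K^{-1}$.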


VI. Numerical simulation

A. Data-bank

For this preliminary work, we use a portion of the data-bank in [7, Section 2], which consists of 4 data collections:

• S1D1: fast systems, data sets with N = 500, SNR=10

• S2D1: slow systems, data sets with N = 500, SNR=10

• S1D2: fast systems, data sets with N = 375, SNR=1

• S2D2: slow systems, data sets with N = 375, SNR=1

Each collection contains 250 randomly generated 30th order discrete-time systems and data sets. The fast systems have all poles inside the circle centered at the origin with radius 0.95, and the slow systems have at least one pole outside this circle. The signal-to-noise ratio (SNR) is defined as the ratio of the variance of the noise-free output to the variance of the white Gaussian noise. In all cases the input is a Gaussian random signal with unit variance. For more details regarding the data-bank, see [7, Section 2].

B. Examined methods

We examine three methods:

1) RLAG-TC, RLAG-DI: the regularized Laguerre basis functions estimation. Laguerre models with orders m = 10, 20, 30, 40 are considered. The TC kernel (12) and the diagonal (DI) kernel $K(k, j; \alpha) = \mathrm{diag}(\alpha, \ldots)$ are used to regularize the Laguerre coefficients. The results are reported as RLAG-TC and RLAG-DI, respectively.

2) LS-LAG: the Laguerre basis function estimation with the least squares method. The estimate of the pole of the Laguerre model is obtained from RLAG-TC, and the least squares method is then used to estimate the Laguerre coefficients without regularization.

3) RFIR-TC, RFIR-LAG: the regularized impulse response estimation. The order of the FIR model (5) is chosen to be 125, and the unknown inputs are set to zero when forming the regression matrix. The TC kernel (12) and the Laguerre kernel (20) are used to regularize the impulse response coefficients. The results are reported as RFIR-TC and RFIR-LAG, respectively. As shown in Section V, RFIR-LAG is equivalent to regularized Laguerre basis functions estimation with the scaled identity kernel $K(k, j; \alpha) = \alpha \delta_{k,j}$.

C. Model fit

To measure the performance of the examined methods, we compare the impulse response of the estimated model with that of the true system: we let $\hat g_k$ and $g_k^0$ denote the $k$th coefficient of the former and the latter impulse response, respectively. Then the model fit is defined as

$$\mathrm{fit} = 100 \left(1 - \left[\frac{\sum_{k=1}^{125} |g_k^0 - \hat g_k|^2}{\sum_{k=1}^{125} |g_k^0 - \bar g^0|^2}\right]^{1/2}\right), \quad \bar g^0 = \frac{1}{125} \sum_{k=1}^{125} g_k^0 \tag{33}$$

D. Simulation result

The average model fits over the corresponding data collections are shown in the table below.


Method      Order      S1D1    S1D2    S2D1    S2D2
LS-LAG      m = 10     80.2    70.2    73.6    59.5
LS-LAG      m = 20     88.8    68.6    82.1    56.3
LS-LAG      m = 30     90.0    62.8    84.0    38.1
LS-LAG      m = 40     88.7    56.6    84.1    -4.9
RFIR-TC     n = 125    91.4    76.1    81.2    66.1
RFIR-LAG    m = 10     80.1    69.6    72.0    60.6
RFIR-LAG    m = 20     88.3    68.5    80.4    61.3
RFIR-LAG    m = 30     89.1    64.7    82.5    59.9
RFIR-LAG    m = 40     88.1    62.6    83.1    58.4
RLAG-TC     m = 10     80.2    71.3    72.9    63.0
RLAG-TC     m = 20     89.2    75.3    81.9    67.8
RLAG-TC     m = 30     91.3    76.1    85.2    69.2
RLAG-TC     m = 40     91.8    76.3    86.8    70.1
RLAG-DI     m = 10     80.4    71.8    73.2    64.0
RLAG-DI     m = 20     89.2    75.7    82.3    68.6
RLAG-DI     m = 30     91.2    76.0    85.9    69.7
RLAG-DI     m = 40     91.6    76.1    86.8    70.0
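The fit measure (33) behind the table entries can be computed as in the following sketch. The function name and the toy impulse response are illustrative assumptions; the paper evaluates the measure on the first 125 impulse response coefficients.

```python
import numpy as np

def model_fit(g_true, g_hat):
    """Fit (33): 100 * (1 - sqrt(sum |g0 - ghat|^2 / sum |g0 - mean(g0)|^2))."""
    g_true = np.asarray(g_true, dtype=float)
    g_hat = np.asarray(g_hat, dtype=float)
    num = np.sum((g_true - g_hat) ** 2)
    den = np.sum((g_true - np.mean(g_true)) ** 2)
    return 100.0 * (1.0 - np.sqrt(num / den))

# Toy example: an exponentially decaying "true" impulse response of length 125.
g0 = 0.9 ** np.arange(1, 126)
perfect = model_fit(g0, g0)                                  # exact estimate
trivial = model_fit(g0, np.full_like(g0, np.mean(g0)))       # constant-mean estimate
```

A perfect estimate gives a fit of 100, the constant mean of the true impulse response gives 0, and estimates worse than that give negative values, which is how an entry such as -4.9 can arise.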



















E. Findings

First, RLAG can achieve performance comparable to RFIR but with a more compact model structure in terms of the number of basis functions. In particular, for the slow systems S2D1 and S2D2, RLAG performs clearly better (by about 5%) than RFIR.

Second, for RLAG, RLAG-DI performs very close to RLAG-TC, which differs from the RFIR case studied in [7], where RFIR-DI is much worse than RFIR-TC. For RFIR, the TC kernel is clearly better than the DI kernel because the impulse response is often smooth, whereas the DI kernel does not assume smoothness. For RLAG, however, no prior knowledge regarding the Laguerre coefficients is available except the absolute convergence of the sum of the Laguerre coefficients (24). Both the TC kernel and the DI kernel can guarantee (24). The simulation results indicate that assuming independence between neighboring Laguerre coefficients is not a bad choice for the tested data-bank.

Third, RFIR-LAG performs worse than RLAG. This coincides with our observation in Section V that ridge regression is not a suitable regularization for Laguerre basis functions. The influence of the unsuitable regularization grows for high-order Laguerre kernels and causes a larger difference in performance.

Fourth, the fact that RLAG performs better than LS-LAG shows the importance of regularization on the Laguerre coefficients.

VII. Conclusion and future works 

In this preliminary work, we have explored the possibilities of tackling regularized system identification problems using orthonormal basis functions.

Interestingly, the idea of constructing kernels from orthonormal basis functions for regularized impulse response estimation turns out to be a special case of regularized orthonormal basis functions estimation; moreover, it is equivalent to ridge regression of the coefficients of the orthonormal basis functions.

The idea of regularizing the orthonormal basis functions works well but still requires more careful investigation. Due to space limitations we have mainly studied the regularized Laguerre basis functions as an instance, but the proposed idea applies to more general orthonormal basis functions, e.g., the Kautz model. Such extensions are necessary and will be examined in our future work, because it is known that the convergence rate of the Laguerre model is slow when the system has poles close to the unit circle. Another interesting topic is how to design a suitable kernel for the coefficients of the orthonormal basis functions.